3 Methodology

We constructed a dataset on which to estimate park activity location choices for a sample of observed trips in Alameda County, California. Alameda County is one of the seven counties that constitutes the San Francisco Bay Area metropolitan region in California. Alameda is the seventh most populous county in California with a population of 1.5 million (U.S. Census Bureau 2019), and has 14 incorporated cities and several unincorporated communities. It is an economically and ethnically diverse county and hence it was an attractive area to use for this study. The racial makeup of Alameda County was (49.7%) White, (11.2%) African American, (1.0%) Native American, (38.7%) Asian, (1.0%) Pacific Islander, and (22.4% Hispanic or Latino (of any race). Alameda County has a diverse set of parks, ranging from local small community parks, urban and transit-accessible parks like the Lake Merritt Recreational area, accessible coastal access, and suburban recreational areas like Lake Chabot.

3.1 Data

We constructed an analysis dataset from a publicly-available parks polygons layer, a commercially acquired passive device origin-destination table representing trips between the parks and home block groups, and American Community Survey data for the home block groups.

We obtained a polygons shapefile layer representing open spaces in Alameda County, California from the California Protected Areas Database (GreenInfo Network 2019). This dataset was selected because it included multiple different types of open space including local and state parks, traditional green spaces as well as wildlife refuges and other facilities that can be used for recreation. We removed facilities that did not allow open access to the public (such as the Oakland Zoo) and facilities whose boundaries conflated with freeway right-of-way – this prevents trips through the park from being conflated with park trips in the passive origin-destination data.

not_parks <- c("3455", "15886", "13243")
parks <- st_read(here("data/bayarea_parks.geojson"), quiet = TRUE) %>%
  filter(COUNTY == "Alameda") %>%
  transmute(
    id = as.character(UNIT_ID), name = UNIT_NAME, 
    access = factor(ACCESS_TYP, levels = c("Open Access", "No Public Access", 
                                           "Restricted Access")),
    acres = ACRES, type = DES_TP) %>%
  filter(!id %in% not_parks) %>%
  filter(access == "Open Access") %>%
  select(id, access, acres, type)  %>%
  st_transform(this_crs)

From this initial parks polygons dataset, we obtained park attribute information through OpenStreetMap (OSM) using the osmdata package for R (Padgham et al. 2017). Three attribute elements are considered in this analysis. First, we identify playgrounds using OSM facilities given a leisure = playground tag. The tagged facilities can be either polygon or point objects; in the former case we use the polygon centroid to determine the location of the playground.

playground_osm <- opq(bb) %>% # specify boundary for query
  add_osm_feature(key = "leisure", value = "playground") %>% # specify which kinds of data we want
  osmdata_sf() %>% # get a list of sf data frames for these tags
  trim_osmdata(bb, exclude = TRUE)

playgrounds <- rbind(
  # convert polygons to centroids
  playground_osm$osm_polygons %>%
    st_transform(this_crs) %>%
    st_centroid() %>%
    select(osm_id),
  # get points
  playground_osm$osm_points %>%
    st_transform(this_crs) %>%
    select(osm_id)
  # no multipolygons or polylines that need to be included
) %>%
  mutate(playground = TRUE)

Second, we consider sport fields of various kinds identified with the OSM leisure = pitch tag. This tag has an additional attribute describing the sport the field is designed for, which we retain in a consolidated manner. Soccer and American football fields are considered in a single category, and baseball and softball fields are combined as well. Basketball, tennis, and volleyball courts are kept as distinct categories, with all other sport-specific fields combined into a single “other.” Golf courses are discarded. As with playgrounds, polygon field and court objects are converted into points at the polygon centroid.

pitches_osm <- opq(bb) %>% # specify boundary for query
  add_osm_feature(key = "leisure", value = "pitch") %>% 
  osmdata_sf() %>% 
  trim_osmdata(bb, exclude = TRUE)

pitches <- rbind(
  # convert polygons to centroids
  pitches_osm$osm_polygons %>%
    st_transform(this_crs) %>%
    st_centroid() %>%
    select(osm_id, sport),
  # get points
  pitches_osm$osm_points %>%
    st_transform(this_crs) %>%
    select(osm_id, sport)
  # no multipolygons or polylines that need to be included
) %>%
  filter(sport != "golf") %>%
  mutate(
    sport = case_when(
      grepl("football", sport) | grepl("soccer", sport) ~ "football / soccer",
      grepl("baseball", sport) | grepl("softball", sport) ~ "baseball",
      grepl("basketball", sport) ~ "basketball",
      grepl("tennis", sport) ~ "tennis",
      grepl("volleyball", sport) ~ "volleyball",
      TRUE ~ "other"
    )
  )

Finally, we identified trails and footpaths using the path, cycleway, and footway values of the highway tag. A visual inspection of the resulting data revealed that the large preponderance of sidewalks and cycling trails within parks in Alameda County are identified properly with these variables. Trails are represented in OSM as polylines, or as polygons if they form a complete circle. In the latter case, we converted the polygon boundary into an explicit polyline object.

trails_osm <- opq(bb) %>% # specify boundary for query
  add_osm_feature(key = "highway", value = c("footway", "cycleway", "path")) %>% 
  osmdata_sf() %>% 
  trim_osmdata(bb, exclude = FALSE)

trails <- rbind(
  # points are nodes, so we can skip them.
  # lines
  trails_osm$osm_lines %>%
    st_transform(this_crs) %>%
    select(osm_id, bicycle, foot, horse),
  
  # circular paths get converted to polygons, so we need to 
  # turn them back into lines.
  trails_osm$osm_polygons %>%
    st_transform(this_crs) %>%
    st_boundary() %>%
    select(osm_id, bicycle, foot, horse)
) %>%
  mutate(length = st_length(.))

It is possible for each of these facilities to exist outside the context of a public park. For example, many private apartment complexes have playgrounds and high schools will have sports facilities that are not necessarily open to the general public. We spatially matched the OSM amenities data and retained only those facilities that intersected with the CPAD open spaces database identified earlier.

# Append attributes to park frame ===============
parks_with_playgrounds <-  parks %>%  
  # compute yeo-johnson transformation on acres
  # determine if a playground point is inside the park polygon boundary
  st_join(playgrounds %>% select(playground), st_intersects)  %>%
  # a park with multiple playgrounds will join multiple times, so we need to 
  # summarize them back down.
  group_by(id) %>% slice(1) %>% ungroup() %>%
  mutate(playground = ifelse(is.na(playground), F, playground)) %>%
  select(id, playground) %>%
  st_set_geometry(NULL)
  
# determine if the pitch points are inside the park polygon boundaries
parks_with_pitches <- parks %>%
  select(id) %>%
  st_join(pitches %>% select(sport)) %>%
  st_set_geometry(NULL) %>%
  # multiple pitches in a park result in repeated rows. 
  group_by(id, sport) %>% summarise(count = n()) %>%
  pivot_wider(id_cols = id, values_from = count, names_from = sport, 
              values_fill = FALSE) %>%
  mutate(
    pitch = ifelse(`NA` == 1, FALSE, TRUE),
    across(baseball:tennis, ~ifelse(. > 0, TRUE, FALSE)),
  ) %>%
    select(-`NA`)

# determine which walking paths are inside park polgon boundaries
parks_with_trails <- parks %>% 
  # determine if the trails are inside the park polygon boundaries
  st_join(trails %>% select(trail = osm_id), st_intersects) %>%
  # multiple pitches in a park result in repeated rows. 
  group_by(id) %>% slice(1) %>% ungroup() %>%
  mutate(trail = ifelse(is.na(trail), F, T)) %>%
  select(id, trail) %>%
  st_set_geometry(NULL)


# join together and write out =================
attributed_parks <- parks %>%
  # compute yeo-johnson transformation on acres
  mutate(yj_acres = VGAM::yeo.johnson(acres, lambda = 0)) %>%
  left_join(parks_with_playgrounds, by = "id") %>%
  left_join(parks_with_pitches,     by = "id") %>%
  left_join(parks_with_trails,      by = "id") 
if(knitr::is_latex_output()) {
  alameda_back <- get_googlemap(c(lon = -122.1300334, lat = 37.69427), 
                                zoom = 9, maptype = "terrain", color = "bw") 
  plot(st_transform(parks, 3857)[,"access"], bgMap = alameda_back)
} else {
  leaflet() %>%
    addProviderTiles(providers$Esri.WorldGrayCanvas) %>%
    addPolygons(data = parks %>% st_transform(4326), group = "Parks", color = "green") %>%
    addCircleMarkers(data = playgrounds %>% st_transform(4326), 
                     group = "Playgrounds", color = "blue", radius = 2) %>%
    addCircleMarkers(data = pitches %>% st_transform(4326), 
                     group = "Sport Fields", color = "red", radius = 2) %>%
    addPolylines(data = trails %>% st_transform(4326), 
                 group = "Trails", color = "black") %>%
    addLayersControl(overlayGroups = c("Parks", "Playgrounds", "Sport Fields", "Trails"))
}

Figure 3.1: Location of parks in Alameda County.

We provided the park boundaries layer to a commercial firm, StreetLight Data Inc., which develops and resells origin-destination matrices derived from passive device location data. The provider employs a proprietary data processing engine (called Route Science) to algorithmically transform observed device location data points (the provider uses in-vehicle GPS units and mobile device LBS) over time into contextualized, normalized, and aggregated travel patterns. From these travel patterns, the Route Science processing algorithms infer likely home Census block group locations for composite groups of people and converts raw location data points into trip origin and destination points (Pan et al. 2006; Friedrich et al. 2010).

park_flows <- read_csv(here("data/Master_BlockGroups_Final071519.csv"), 
                       col_types = cols()) %>% 
  # need to make block group id properly formatted.
  transmute(
    park = as.character(`Zone Name`),
    home = paste0("0", round(`Block Group ID`)),
    county = substr(home, 0, 5),
    attractions = `Zone Traffic (StL Index)`,
    d_share = `Percent by Home Location`
  ) %>%
  
  # only keep bgs in the area
  filter(county %in% c("06001")) %>%
  # and only keep parks that we are looking at
  filter(park %in% parks$id) %>%
  
  mutate(flow = attractions * d_share) %>%
  group_by(home) %>%
  mutate(
    productions = sum(flow),
    o_share = flow / productions
  ) 

For each park polygon, the firm returned a population-weighted estimate of how many devices were observed by home location block group over several months in the period between May 2018 and October 2018. We transformed this table such that it represented the weighted unique devices traveling between block groups and parks. We discarded home location block groups outside of Alameda County; the imputed home locations can be far away from the study area for a small amount of trips and are unlikely to represent common or repeated park activities.

attributed_parks <- attributed_parks %>% group_by(type) %>%
  left_join(park_flows %>% group_by(park) %>% slice(1) %>% 
              select(park, attractions), by = c("id" = "park")) %>%
  mutate(attractions = ifelse(is.na(attractions), 0, attractions)) 

Table presents descriptive statistics on the 500 parks assembled for this study, grouped according to the park type as defined on CPAD. A little more than half of the parks have identified paths, while around 40% of the identified parks have playgrounds and sport fields.

park_attributes_data <- datasummary_balance(
  ~ type,  
  data = attributed_parks %>% 
    select(Access = access, Acres = acres, type, Playground = playground, 
           `Any Sport Field` = pitch, 
           `Football / Soccer` = `football / soccer`,
           `Baseball` = baseball, `Basketball` = basketball,
           `Tennis` = tennis, `Volleyball` = volleyball,
           Trail = trail, 
           `Mobile Devices` =attractions), 
  dinm = FALSE, caption = "Park Summary Statistics", booktabs = TRUE)

if(knitr::is_latex_output()){
  park_attributes_data %>% kableExtra::kable_styling(table.envir = "sidewaystable")
} else {
  park_attributes_data
}
Table 3.1: Park Summary Statistics
Local Park (N=441)
Local Recreation Area (N=57)
State Recreation Area (N=2)
Mean Std. Dev. Mean Std. Dev. Mean Std. Dev.
Acres 59.8 370.5 115.2 509.6 421.0 271.7
Mobile Devices 1450.0 6685.4 2749.5 6250.4 80.5 113.8
N % N % N %
Access Open Access 441 88.2 57 11.4 2 0.4
No Public Access 0 0.0 0 0.0 0 0.0
Restricted Access 0 0.0 0 0.0 0 0.0
Playground FALSE 225 45.0 42 8.4 2 0.4
TRUE 216 43.2 15 3.0 0 0.0
Any Sport Field FALSE 277 55.4 37 7.4 2 0.4
TRUE 164 32.8 20 4.0 0 0.0
Football / Soccer FALSE 415 83.0 49 9.8 2 0.4
TRUE 26 5.2 8 1.6 0 0.0
Baseball FALSE 364 72.8 43 8.6 2 0.4
TRUE 77 15.4 14 2.8 0 0.0
Basketball FALSE 342 68.4 50 10.0 2 0.4
TRUE 99 19.8 7 1.4 0 0.0
Tennis FALSE 387 77.4 51 10.2 2 0.4
TRUE 54 10.8 6 1.2 0 0.0
Volleyball FALSE 434 86.8 55 11.0 2 0.4
TRUE 7 1.4 2 0.4 0 0.0
Trail FALSE 156 31.2 22 4.4 0 0.0
TRUE 285 57.0 35 7.0 2 0.4

In order to understand the demographic makeup of the home block groups and potentially the characteristics of the people who make each trip, we obtained 2013-2017 five-year data aggregations from the American Community Survey using the tidycensus (Walker 2019) interface to the Census API for several key demographic and built environment variables: the share of individuals by ethnic group, the share of households by income level, household median income, and the housing unit density. An important attribute of the choice model is the distance from the home block group to the park boundary. Because we have no information on where in the block group a home is actually located, we use the population-weighted block group centroid published by the Census Bureau as the location for all homes in the block group. We then measured the network-based distance between the park and the home block group centroid using a walk network derived from OpenStreetMap using the networkx package for Python (Hagberg, Schult, and Swart 2008),

variables <- c(
  "population" = "B25008_001", # TOTAL POPULATION IN OCCUPIED HOUSING UNITS BY TENURE
  "housing_units" = "B25001_001", # HOUSING UNITS
  "households" = "B19001_001", #HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2017 INFLATION-ADJUSTED DOLLARS)
  #Estimate!!Total!!Female!!Worked in the past 12 months!!Usually worked 35 or more hours per week
  # RACE
  "black" = "B02001_003",
  "asian" = "B02001_005",
  "pacific" = "B02001_006",
  "nativeam" = "B02001_004",
  # HISPANIC OR LATINO ORIGIN BY SPECIFIC ORIGIN
  # The number of hispanic individuals needs to be drawn from a different table.
  "hispanic" = "B03001_003",
  #MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2017 INFLATION-ADJUSTED DOLLARS)
  "income" = "B19013_001",
  #HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2017 INFLATION-ADJUSTED DOLLARS)
  "inc_0010" = "B19001_002",  "inc_1015" = "B19001_003", "inc_1520" = "B19001_004",
  "inc_2025" = "B19001_005", "inc_2530" = "B19001_006", "inc_3035" = "B19001_007",
  "inc_125"  = "B19001_015", "inc_150"  = "B19001_016", "inc_200"  = "B19001_017"
)

acs <- get_acs(geography = "block group", variables = variables, year = 2017,
               state = "CA", county = "001", geometry = TRUE) %>%
  select(-moe) %>%
  spread(variable, estimate) %>%
  # area is in m^2, change to km^2
  mutate(area = as.numeric(st_area(geometry) * 1e-6)) %>%
  transmute(
    geoid = GEOID,
    group = 1,
    population, households, housing_units, 
    density = households / area,
    income,
    # many of the variables come in raw counts, but we want to consider
    # them as shares of a relevant denominator.
    lowincome    = 100 * (inc_0010 + inc_1015 + inc_1520 + inc_2530 +
                            inc_3035) / households,
    highincome   = 100 * (inc_125 + inc_150 + inc_200) / households,
    black        = 100 * black / population,
    asian        = 100 * asian / population,
    other        = 100 * (nativeam + pacific) / population,
    white_hisp   = 100 - black - asian - other
  ) %>%
  filter(population > 0)
# distances computed in py/shortest_paths.py
path_files <- dir("data/shortest_paths/", full.names = TRUE)
distance_df <- lapply(path_files, function(file) {
  read_csv(file, col_types = list(geoid = col_character(), park_id = col_character())) %>%
  mutate(
    yj_distance = VGAM::yeo.johnson(distance, lambda = 0),
    yj_euc_dist = VGAM::yeo.johnson(euc_dist, lambda = 0)
  )
}) %>%
  bind_rows() %>%
  rename(home = geoid, park = park_id) %>%
  mutate(home = sprintf("%012s", home))
acs_for_table <- acs %>% select(
    "Density: Households per square kilometer" = density,
    "Income: Median tract income" = income,
    "Low Income: Share of households making less than $35k" = lowincome,
    "High Income: Share of households making more than $125k" = highincome,
    "Black: Share of population who is Black" = black,
    "Asian: Share of population who is Asian" = asian,
    "Other: Share of population who belong to other minority groups" = other)

if(knitr::is_latex_output()){ 
  datasummary_skim(acs_for_table, title = "Block Group Summary Statistics",
    booktabs = TRUE, histogram = FALSE, output = "latex") %>%
    kableExtra::column_spec(1, width = "3cm")
} else {
  datasummary_skim( acs_for_table, title = "Block Group Summary Statistics")
}
Table 3.2: Block Group Summary Statistics
Unique (#) Missing (%) Mean SD Min Median Max
Density: Households per square kilometer 1040 0 1714.5 1527.2 0.8 1352.9 19490.0
Income: Median tract income 971 3 93246.3 44900.3 9821.0 85673.0 250001.0
Low Income: Share of households making less than $35k 979 0 18.4 14.0 0.0 15.1 91.7
High Income: Share of households making more than $125k 1004 0 32.5 20.3 0.0 30.4 100.0
Black: Share of population who is Black 927 0 12.5 14.8 0.0 6.8 105.3
Asian: Share of population who is Asian 1010 0 27.8 32.3 0.0 21.1 610.4
Other: Share of population who belong to other minority groups 612 0 1.5 2.5 0.0 0.5 19.7

3.2 Model

In random utility choice theory, if an individual living in block group \(n\) wishes to make a park trip, the probability that the individual will choose park \(i\) from the set of all parks \(J\) can be described as a ratio of the park’s measurable utility \(V_{ni}\) to the sum of the utilities for all parks in the set. In the common destination choice framework we apply a multinomial logit model (McFadden 1974, @Recker1978), \[\begin{equation}\label{eq:p} P_{ni} = \frac{\exp(V_{ni})}{\sum_{j \in J}\exp(V_{nj})} \end{equation}\] where the measurable utility \(V_{ni}\) is a linear-in-parameters function of the destination attributes. \[\begin{equation}\label{eq:V} V_{ni} = X\beta \end{equation}\] where \(\beta\) is a vector of estimable coefficients giving the relative utility (or disutility) of that attribute to the choice maker, all else equal. It is possible to add amenities of the park or the journey to the utility equation. However, as the number of alternatives is large, it is impractical to consider alternative-specific constants or coefficients and therefore not possible to include attributes of the home block group or traveler \(n\) directly. We can, however, segment the data and estimate different distance and size parameters for different segments to observe heterogeneity in the utility parameters between different socioeconomic groups.

The logarithm of the sum in the denominator of Equation (called the logsum) provides a measure of the consumer surplus of the choice set (Williams 1977), \[\begin{equation}\label{eq:logsum} CS_n = \ln{{\sum_{j \in J}\exp(V_{nj})}} + C \end{equation}\] where \(C\) is a constant indicating an unknown absolute value. But comparing the relative logsum values across choice makers, \(CS_n - CS_{n-1}\) gives an indication of which choice maker has a more valuable choice set. Or, in this case of a park destination choice model, which choice maker has better access to parks. Such a “utility-based” accessibility term is thus continuously defined, dervied directly from choice theory, and can contain multiple dimensions of the attributes of the choice (Handy and Niemeier 1997; Dong et al. 2006).

set.seed(42)
n_obs <- 20000
n_alts <- 10
n_flow <- sum(park_flows$flow)
ll0e <- sum(n_obs *.8 * log(1/n_alts))
mydata <- park_flows %>%
  ungroup() %>%
  mutate(weight = flow / sum(flow)) %>%
  sample_n(n_obs, replace = TRUE, weight = weight) %>%
  transmute(id = row_number(), home, alt_0 = park, 
            validation = sample(c(TRUE,FALSE), n(), TRUE, prob = c(0.2, 0.8)))

In the most typical cases, researchers estimate the utility coefficients for destination choice models from household travel surveys. As we have no knowledge of an appropriate survey on park access, we need to synthesize a suitable estimation data set. We do this by sampling 210^{4} random discrete device origin-destination pairs from the commercial passive data matrix, weighted by the volume of the flows. This corresponds to a 4.3% sample of all the observed device origin-destination pairs.

The sampled origin-destination pair gives the home location as well as the “chosen” alternative for a synthetic person. In principle the individual’s choice set contains all the parks in our dataset; in practice it can be difficult to estimate choice models with so many alternatives (\(|J| = 500\)). For this reason we randomly sample 10 additional parks to serve as the non-chosen alternatives for our synthetic choice maker. Such random sampling of alternatives reduces the efficiency of the estimated coefficients but the coefficients remain unbiased (Train 2009). As the model has no alternative-specific constants, the standard likelihood comparison statistic against the market shares model \(\rho^2\) is not computable. We instead use the likelihood comparison against the equal shares model \(\rho_0^2\).

sampled_parks <- lapply(1:n_obs, function(i){
  sample(attributed_parks$id, n_alts)
}) %>%
  unlist() %>%
  matrix(ncol = n_alts) %>%
  as_tibble(.name_repair = ~ str_c("alt", 1:n_alts, sep = "_"))

logitdata <- mydata %>%
  bind_cols(sampled_parks) %>%
  gather(key = "alt", value = "park", -id, -home, -validation) %>%
  mutate(chosen = alt == "alt_0") %>%
  arrange(id, alt) %>%
  
  # append distances
  inner_join(distance_df, by = c("home", "park")) %>%
  
  # append park attributes
  left_join(attributed_parks, by = c("park" = "id")) %>%
  
  # append block group attributes
  left_join(acs %>% select(geoid, density:other) %>% st_set_geometry(NULL),
            by = c("home" = "geoid")
   
)

write_rds(logitdata, "data/logitdata.rds")

The resulting analysis dataset therefore contains 210^{4} choice makers that select between 11 parks including the park they were observed to choose; the measured distance between the choice maker’s block group and all parks in the choice set; and the acreage of each park in the choice set. We hold out a random sample of approximately 20% of choice makers for validation purposes. We use the mlogit package for R (Croissant 2019; R Core Team 2020) to estimate the multinomial logit models.

References

Croissant, Yves. 2019. mlogit: Multinomial Logit Models. https://cran.r-project.org/package=mlogit.

Dong, Xiaojing, Moshe E. Ben-Akiva, John L. Bowman, and Joan L. Walker. 2006. “Moving from trip-based to activity-based measures of accessibility.” Transportation Research Part A: Policy and Practice 40 (2): 163–80. https://doi.org/10.1016/J.TRA.2005.05.002.

Friedrich, Markus, Katrin Immisch, Prokop Jehlicka, Thomas Otterstätter, and Johannes Schlaich. 2010. “Generating Origin-Destination Matrices from Mobile Phone Trajectories.” Transportation Research Record: Journal of the Transportation Research Board 2196: 93–101. https://doi.org/10.3141/2196-10.

GreenInfo Network. 2019. “California Protected Areas Database.” https://www.calands.org/cpad/.

Hagberg, Aric A., Daniel A. Schult, and Pieter J. Swart. 2008. “Exploring Network Structure, Dynamics, and Function Using Networkx.” In Proceedings of the 7th Python in Science Conference, edited by Gaël Varoquaux, Travis Vaught, and Jarrod Millman, 11–15. Pasadena, CA USA.

Handy, S L, and D A Niemeier. 1997. “Measuring Accessibility: An Exploration of Issues and Alternatives.” Environment and Planning A 29 (7): 1175–94. https://doi.org/10.1068/a291175.

McFadden, Daniel L. 1974. “Conditional Logit Analysis of Qualitative Choice Behavior.” In Frontiers in Econometrics, edited by Paul Zarembka, 105–42. New York: Academic Press.

Padgham, Mark, Bob Rudis, Robin Lovelace, and Maëlle Salmon. 2017. “Osmdata.” The Journal of Open Source Software 2 (14). https://doi.org/10.21105/joss.00305.

Pan, Changxuan, Jiangang Lu, Shan Di, and Bin Ran. 2006. “Cellular-Based Data-Extracting Method for Trip Distribution.” Transportation Research Record: Journal of the Transportation Research Board 1945 (1): 33–39. https://doi.org/10.1177/0361198106194500105.

R Core Team. 2020. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Train, Kenneth E. 2009. Discrete Choice Methods with Simulation, 2nd Edition. Cambridge: Cambridge University Press. https://doi.org/10.1016/S0898-1221(04)90100-9.

U.S. Census Bureau. 2019. “QuickFacts: Alameda County, California.” https://www.census.gov/quickfacts/fact/table/alamedacountycalifornia/PST045218.

Walker, Kyle. 2019. tidycensus: Load US Census Boundary and Attribute Data as ’tidyverse’ and ’sf’-Ready Data Frames. https://cran.r-project.org/package=tidycensus.

Williams, H C W L. 1977. “On the Formation of Travel Demand Models and Economic Evaluation Measures of User Benefit.” Environment and Planning A: Economy and Space 9 (3): 285–344. https://doi.org/10.1068/a090285.